Research and Realization of the Extensible Data Cleaning Framework
Author
Abstract
This paper proposes an extensible data cleaning framework built on the key technologies of data cleaning. The framework includes an open rule library and an open algorithm library. The paper describes the model principle and working process of the framework and verifies its validity by experiment. During cleaning, the errors in the data source can be cleaned according to the specific business requirements by applying predefined cleaning rules and choosing an appropriate algorithm. The final stage of the implementation provides an initial version of the basic functions of the framework's data cleaning module, and the experiment verifies that the framework has good efficiency and operational effect.
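The abstract does not give the framework's interfaces, so the following is only a minimal sketch of how an open rule library and an open algorithm library might fit together. Every name in it (`ALGORITHMS`, `RULES`, `register_algorithm`, `clean`, and the two sample algorithms) is a hypothetical illustration, not the paper's implementation.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Algorithm = Callable[[str], str]

# Hypothetical open algorithm library: algorithm name -> cleaning function.
ALGORITHMS: Dict[str, Algorithm] = {}


def register_algorithm(name: str) -> Callable[[Algorithm], Algorithm]:
    """Add a cleaning function to the algorithm library under `name`."""
    def decorator(func: Algorithm) -> Algorithm:
        ALGORITHMS[name] = func
        return func
    return decorator


@register_algorithm("strip_whitespace")
def strip_whitespace(value: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(value.split())


@register_algorithm("lowercase")
def lowercase(value: str) -> str:
    return value.lower()


# Hypothetical rule library: each predefined rule names a field and the
# algorithm chosen to clean it (assumed to reflect a business requirement).
RULES: List[Dict[str, str]] = [
    {"field": "name", "algorithm": "strip_whitespace"},
    {"field": "email", "algorithm": "lowercase"},
]


def clean(record: Record, rules: List[Dict[str, str]]) -> Record:
    """Apply each rule's algorithm to its field and return a cleaned copy."""
    cleaned = dict(record)
    for rule in rules:
        field = rule["field"]
        if field in cleaned:
            cleaned[field] = ALGORITHMS[rule["algorithm"]](cleaned[field])
    return cleaned
```

Under these assumptions, extending the framework means registering another function or appending another rule; the `clean` loop itself does not change.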
Similar resources
An Extensible Framework for Data Cleaning
We propose an extensible data cleaning tool, named AJAX, that supports the specification and efficient execution of complex data cleaning programs.
Declarative Support for Sensor Data Cleaning
Pervasive applications rely on data captured from the physical world through sensor devices. Data provided by these devices, however, tend to be unreliable. The data must, therefore, be cleaned before an application can make use of them, leading to additional complexity for application development and deployment. Here we present Extensible Sensor stream Processing (ESP), a framework for buildin...
XML based Framework for ETL Processes For Relational Databases
In Data Warehousing, Extraction-Transformation-Loading (ETL) are the key tasks that are responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse [10]. More specifically, ETL tools are a category of specialized tools with the task of dealing with data warehouse cleaning and loading problems. These tasks are very critical in every d...
TAILOR: A Record Linkage Tool Box
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...
Parametric study of a viscoelastic RANS turbulence model in the fully developed channel flow
One of the newest viscoelastic RANS turbulence models for drag-reducing channel flow with polymer additives is studied under different flow and rheological properties. In this model, the finitely extensible nonlinear elastic-Peterlin (FENE-P) constitutive model is used to describe the viscoelastic effect of the polymer solution, and the turbulence model is developed in the $k$-$\epsilon$-$\overline{v^2}$-$f$ framework. The geome...